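Adaptive training methods for physics-informed neural networks (PINNs) require a dedicated construction of the distribution of weights assigned to each training sample. Efficiently finding such an optimal weight distribution is not a simple task, and most existing methods choose the adaptive weights by approximating either the full distribution or the maximum of the residuals. In this paper, we show that the bottleneck in the adaptive choice of samples for training efficiency is the behavior of the tail distribution of the numerical residual. We therefore propose the Residual-Quantile Adjustment (RQA) method, which gives a better choice of weight for each training sample. After initially setting the weights proportional to the $p$-th power of the residual, our RQA method reassigns all weights above the $q$-quantile (e.g., $90\%$) to the median value, so that the weights follow a quantile-adjusted distribution derived from the residuals. Combined with an iterative reweighting technique, RQA is also very easy to implement. Experimental results show that the proposed method can outperform several adaptive methods on a variety of partial differential equation (PDE) problems.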
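The reweighting rule described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `quantile` and `rqa_weights` are hypothetical, the median is taken over the raw weights (the abstract only says "the median value"), the quantile uses linear interpolation, and the final normalization to unit sum is an added convenience.

```python
import statistics


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def rqa_weights(residuals, p=2.0, q=0.9):
    """Quantile-adjusted sample weights (hypothetical helper).

    1. Raw weight of each sample is |residual|^p.
    2. Every raw weight above the q-quantile is replaced by the median
       raw weight (assumption: the median is taken over the raw weights).
    3. Weights are normalized to sum to 1 (assumption, for convenience).
    """
    raw = [abs(r) ** p for r in residuals]
    threshold = quantile(raw, q)
    median = statistics.median(raw)
    adjusted = [median if w > threshold else w for w in raw]
    total = sum(adjusted)
    if total == 0.0:
        # All residuals are zero: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in adjusted]


# Nine moderate residuals and one extreme outlier in the tail.
residuals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 10.0]
weights = rqa_weights(residuals, p=2.0, q=0.9)
```

In this toy input the outlier's raw weight lies above the 0.9-quantile, so it is pulled down to the median and no longer dominates the weighted loss. In an iterative reweighting loop, such weights would presumably be recomputed from the current residuals at each update.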
translated by 谷歌翻译
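We propose a reinforcement learning (RL) approach to compute the expression of the quasi-stationary distribution. Based on the fixed-point formulation of the quasi-stationary distribution, we minimize the KL divergence between the two Markovian path distributions induced by the candidate distribution and the true target distribution. To solve this challenging minimization problem by gradient descent, we apply reinforcement learning techniques by introducing the corresponding reward and value functions. We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and the value function. Numerical examples of finite-state Markov chains are tested to demonstrate the new method.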
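A least-squares neural network (LSNN) method was introduced in [6,5] for solving scalar linear and nonlinear hyperbolic conservation laws (HCLs). The method is based on an equivalent least-squares (LS) formulation and uses ReLU neural networks as approximating functions, which makes it particularly well suited to approximating discontinuous functions with unknown interface locations. In the design of the LSNN method for HCLs, the numerical approximation of the differential operator plays a critical role, and standard numerical or automatic differentiation along coordinate directions often results in a failed NN-based method. To overcome this difficulty, this paper rewrites HCLs in their space-time divergence form and introduces a new discrete divergence operator. Theoretically, the accuracy of the discrete divergence operator is estimated even when the solution is discontinuous. Numerically, the resulting LSNN method with the new discrete divergence operator is tested on several benchmark problems with both convex and non-convex fluxes. The method is able to compute the correct physical solution for problems with rarefaction waves and to capture the shock of the underlying problem without oscillation or smearing.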
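We previously introduced the least-squares ReLU neural network (LSNN) method for solving the linear advection-reaction problem with a discontinuous solution and showed that the method outperforms mesh-based numerical methods in terms of the number of degrees of freedom. This paper studies the LSNN method for scalar nonlinear hyperbolic conservation laws. The method is a discretization of an equivalent least-squares (LS) formulation in the set of neural network functions with the ReLU activation function. The LS functional is evaluated using numerical integration and a conservative finite volume scheme. Numerical results for several test problems show that the method is able to approximate the discontinuous interface of the underlying problem automatically through the free breaking lines of the ReLU neural network. Moreover, the method does not exhibit the common Gibbs phenomenon along the discontinuous interface.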
Recent investigations on rotation invariance for 3D point clouds have been devoted to devising rotation-invariant feature descriptors or learning canonical spaces where objects are semantically aligned. Learning frameworks for invariance, however, have seldom been examined. In this work, we review rotation invariance in terms of point cloud registration and propose an effective framework for rotation invariance learning via three sequential stages, namely rotation-invariant shape encoding, aligned feature integration, and deep feature registration. We first encode shape descriptors constructed with respect to reference frames defined over different scales, e.g., local patches and global topology, to generate rotation-invariant latent shape codes. Within the integration stage, we propose the Aligned Integration Transformer to produce a discriminative feature representation by integrating point-wise self- and cross-relations established within the shape codes. Meanwhile, we adopt rigid transformations between reference frames to align the shape codes for feature consistency across different scales. Finally, the deep integrated feature is registered to both rotation-invariant shape codes to maximize feature similarities, such that rotation invariance of the integrated feature is preserved and shared semantic information is implicitly extracted from the shape codes. Experimental results on 3D shape classification, part segmentation, and retrieval tasks demonstrate the feasibility of our approach. Our project page is released at: https://rotation3d.github.io/.
With the attention mechanism, transformers achieve significant empirical successes. Despite the intuitive understanding that transformers perform relational inference over long sequences to produce desirable representations, we lack a rigorous theory on how the attention mechanism achieves it. In particular, several intriguing questions remain open: (a) What makes a desirable representation? (b) How does the attention mechanism infer the desirable representation within the forward pass? (c) How does a pretraining procedure learn to infer the desirable representation through the backward pass? We observe that, as is the case in BERT and ViT, input tokens are often exchangeable since they already include positional encodings. The notion of exchangeability induces a latent variable model that is invariant to input sizes, which enables our theoretical analysis. - To answer (a) on representation, we establish the existence of a sufficient and minimal representation of input tokens. In particular, such a representation instantiates the posterior distribution of the latent variable given input tokens, which plays a central role in predicting output labels and solving downstream tasks. - To answer (b) on inference, we prove that attention with the desired parameter infers the latent posterior up to an approximation error, which is decreasing in input sizes. In detail, we quantify how attention approximates the conditional mean of the value given the key, which characterizes how it performs relational inference over long sequences. - To answer (c) on learning, we prove that both supervised and self-supervised objectives allow empirical risk minimization to learn the desired parameter up to a generalization error, which is independent of input sizes. Particularly, in the self-supervised setting, we identify a condition number that is pivotal to solving downstream tasks.
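One way to read the statement that attention approximates the conditional mean of the value given the key: for a single query, softmax attention returns a weighted average of the values, which has the same form as a kernel-regression estimate of the mean value near that query. The sketch below is a generic single-query softmax attention in plain Python, not the paper's construction. The name `softmax_attention` is hypothetical, and scaling and learned projections are omitted as a simplifying assumption.

```python
import math


def softmax_attention(query, keys, values):
    """Single-query softmax attention (illustrative sketch).

    Returns (output, weights), where output is the weighted average of
    `values` and weights[i] is proportional to exp(<query, keys[i]>).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Two clusters of keys with different typical values; the query sits
# near the first cluster, so its values dominate the average.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
values = [[1.0], [1.2], [5.0], [5.2]]
output, weights = softmax_attention([4.0, 0.0], keys, values)
```

Since the weights are non-negative and sum to one, the output always lies within the range of the values. As more (key, value) pairs are averaged, such an estimate typically becomes less noisy, which is consistent with the abstract's claim of an approximation error that decreases with input size.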
In the new era of personalization, learning the heterogeneous treatment effect (HTE) has become an inevitable trend with numerous applications. Yet most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore the individualized information. To fill the gap, in this paper we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learners, namely H1SL and H2SL, by leveraging the state-of-the-art HTE estimator for non-panel data and generalizing the synthetic control method to allow a flexible data-generating process. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.
The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations, such as translation, have been applied. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprising image, text, and tabular data.
High feature dimensionality is a challenge in music emotion recognition (MER). There is no consensus on the relation between audio features and emotion. An MER system typically uses all available features to recognize emotion; however, this is not an optimal solution, since the full set contains irrelevant data that acts as noise. In this paper, we introduce a feature selection approach to eliminate redundant features for MER. We created a Selected Feature Set (SFS) based on the feature selection algorithm (FSA) and benchmarked it by training two models, Support Vector Regression (SVR) and Random Forest (RF), and comparing them against the same models trained on the Complete Feature Set (CFS). The results indicate that using the SFS improves MER performance for both the RF and SVR models. We found that using the FSA can improve performance in all scenarios, and that it has potential benefits for model efficiency and stability in the MER task.
A general, {\em rectangular} kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are not well-separated (e.g., the points in $X$ and $Y$ may be ``intermingled''). Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix, which rules out most algebraic techniques. In particular, we seek methods that can scale linearly, i.e., with computational complexity $O(m)$ or $O(n)$ for a fixed accuracy or rank. The main idea in this paper is to {\em geometrically} select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
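To make the setting concrete, the sketch below builds a low-rank factorization of a rectangular kernel matrix from a small subset $S$ of points, using the Nyström-style formula $K(X,Y) \approx K(X,S)\,K(S,S)^{+}\,K(S,Y)$. This is a generic illustration, not the selection strategy analyzed in the paper. The Gaussian kernel, the uniformly random choice of $S$, the `rcond` cutoff, and the function names `gaussian_kernel` and `landmark_low_rank` are all assumptions made for the example. NumPy is assumed to be available.

```python
import numpy as np


def gaussian_kernel(A, B, lengthscale=1.0):
    """Pairwise kappa(a, b) = exp(-||a - b||^2 / (2 * lengthscale^2))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * lengthscale ** 2))


def landmark_low_rank(X, Y, S, kernel=gaussian_kernel):
    """Return factors (U, V) with K(X, Y) ~= U @ V and rank <= len(S).

    Only K(X, S), K(S, S) and K(S, Y) are evaluated, so for a fixed
    number of landmarks the cost grows linearly in len(X) and len(Y).
    """
    K_xs = kernel(X, S)  # m x k
    K_ss = kernel(S, S)  # k x k
    K_sy = kernel(S, Y)  # k x n
    # Pseudo-inverse with an explicit cutoff, since K_ss can be ill-conditioned.
    U = K_xs @ np.linalg.pinv(K_ss, rcond=1e-10)
    return U, K_sy


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
Y = rng.uniform(0.0, 1.0, size=(150, 2))  # intermingled with X
pool = np.vstack([X, Y])
# Assumption: landmarks drawn uniformly at random from the pooled points.
S = pool[rng.choice(len(pool), size=20, replace=False)]

kern = lambda A, B: gaussian_kernel(A, B, lengthscale=0.3)
U, V = landmark_low_rank(X, Y, S, kernel=kern)

# The full matrix is formed here only to measure the error on a small example.
K = kern(X, Y)
rel_err = np.linalg.norm(K - U @ V) / np.linalg.norm(K)
```

The quality of such an approximation depends strongly on which points go into $S$, which is the question the abstract says the paper addresses through geometric selection.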